Goto

Collaborating Authors

 layer normalization






Theoretical

Neural Information Processing Systems

The question of if and how rank collapse affects training is still largelyunanswered, anditsinvestigation isnecessary foramore comprehensive understanding ofthisarchitecture.




A Operator integration

Neural Information Processing Systems

Current operator library with quantized operators is not feasible for vision transformer inference because of the specific operators including the GeLU activation and layer normalization. We provide the details of how to approximate the square root operators in Algorithm.1. B.2 Hypernetwork Search Space We set hypernetwork search space with the following factors. 1 1. We use a population size of 50.


3309b4112c9f04a993f2bbdd0274bba1-Paper-Conference.pdf

Neural Information Processing Systems

A wide range of models have been proposed for Graph Generative Models, necessitating effective methods to evaluate their quality. So far, most techniques use either traditional metrics based onsubgraph counting, ortherepresentations of randomly initialized Graph Neural Networks (GNNs). We propose using representations from contrastively trained GNNs, rather than random GNNs, and show this gives more reliable evaluation metrics. Neither traditional approaches nor GNN-based approaches dominate the other,however: we giveexamples of graphs that each approach is unable to distinguish. We demonstrate that Graph Substructure Networks (GSNs), which in a way combine both approaches, are better at distinguishing the distances between graph datasets.


Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models and Time-Dependent Layer Normalization

Neural Information Processing Systems

This paper presents innovative enhancements to diffusion models by integrating a novel multi-resolution network and time-dependent layer normalization.Diffusion models have gained prominence for their effectiveness in high-fidelity image generation.While conventional approaches rely on convolutional U-Net architectures, recent Transformer-based designs have demonstrated superior performance and scalability.However, Transformer architectures, which tokenize input data (via patchification), face a trade-off between visual fidelity and computational complexity due to the quadratic nature of self-attention operations concerning token length.While larger patch sizes enable attention computation efficiency, they struggle to capture fine-grained visual details, leading to image distortions.To address this challenge, we propose augmenting the **Di**ffusion model with the **M**ulti-**R**esolution network (DiMR), a framework that refines features across multiple resolutions, progressively enhancing detail from low to high resolution.Additionally, we introduce Time-Dependent Layer Normalization (TD-LN), a parameter-efficient approach that incorporates time-dependent parameters into layer normalization to inject time information and achieve superior performance.Our method's efficacy is demonstrated on the class-conditional ImageNet generation benchmark, where DiMR-XL variants surpass previous diffusion models, achieving FID scores of 1.70 on ImageNet $256 \times 256$ and 2.89 on ImageNet $512 \times 512$. Our best variant, DiMR-G, further establishes a state-of-the-art 1.63 FID on ImageNet $256 \times 256$.